Sentiment analysis, also known as opinion mining, is the process of extracting and analyzing the emotions and attitudes expressed in text. It has gained significant attention in natural language processing because it provides insight into the subjective opinions of individuals or groups towards a particular topic or product. One application is the analysis of song lyrics, which can reveal the emotions and themes expressed in songs across genres and cultures. Applying sentiment analysis to lyrics can help researchers and industry professionals better understand the emotional impact and cultural significance of music, as well as the social and political contexts in which it is created and consumed. This project conducts sentiment analysis on a large dataset of lyrics to explore the emotional content and sentiment patterns of popular music. Unsupervised sentiment analysis is performed to answer the following questions of interest:
What is the prevailing sentiment of popular songs in various countries?
How do the sentiments expressed in song lyrics vary across different countries?
# IMPORT DEPENDENCIES
import time
import numpy as np
import pandas as pd
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
import plotly.offline as pyo
import chart_studio
import chart_studio.plotly as py
import chart_studio.tools as tls
from textblob import TextBlob
from nrclex import NRCLex

pyo.init_notebook_mode()  # render Plotly figures inline in the notebook
nltk.download('vader_lexicon')  # lexicon required by SentimentIntensityAnalyzer
# READ DATA
song_data = pd.read_csv('../Data/merged_finaltop100_revised.csv')
# REMOVE ROWS WITH NULL VALUES
song_data = song_data.dropna()
song_data.head()
| | Unnamed: 0 | track_id | artist_names | track_name | source | rank | weeks_on_chart | streams | country | danceability | ... | duration_ms | time_signature | album_release_date | lyrics | lyrics_trans | continent | iso_alpha3 | len_words_orig | len_words_trans | lyrics_clean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0yLdNVWF3Srea0uzk55zFn | Miley Cyrus | Flowers | Columbia | 1 | 5 | 124198 | United Arab Emirates | 0.707 | ... | 200455.0 | 4.0 | 2023-01-13 | We were good, we were gold\nKinda dream that c... | we were good we were gold kinda dream that can... | Asia | ARE | 334 | 334 | good gold dream sell right til build home watc... |
| 1 | 1 | 1Qrg8KqiBpW07V7PNxwwwL | SZA | Kill Bill | Top Dawg Entertainment/RCA Records | 2 | 10 | 106927 | United Arab Emirates | 0.644 | ... | 153947.0 | 4.0 | 2022-12-08 | I'm still a fan even though I was salty\nHate ... | im still a fan even though i was salty hate to... | Asia | ARE | 362 | 362 | fan even though salty hate see broad know happ... |
| 2 | 2 | 6AQbmUe0Qwf5PZnt4HmTXv | PinkPantheress, Ice Spice | Boy's a liar Pt. 2 | Warner Records | 3 | 2 | 83627 | United Arab Emirates | 0.696 | ... | 131013.0 | 4.0 | 2023-02-03 | Take a look inside your heart\nIs there any ro... | take a look inside your heart is there any roo... | Asia | ARE | 372 | 372 | take look inside heart room room would hold br... |
| 3 | 3 | 0WtM2NBVQNNJLh6scP13H8 | Rema, Selena Gomez | Calm Down (with Selena Gomez) | Mavin Records / Jonzing World | 4 | 25 | 79714 | United Arab Emirates | 0.801 | ... | 239318.0 | 4.0 | 2022-08-25 | Vibez\nOh, no\nAnother banger\nBaby, calm down... | vibez oh no another banger baby calm down calm... | Asia | ARE | 495 | 495 | another banger baby calm calm girl body put he... |
| 4 | 4 | 2dHHgzDwk4BJdRwy9uXhTO | Metro Boomin, The Weeknd, 21 Savage | Creepin' (with The Weeknd & 21 Savage) | Republic Records | 5 | 11 | 79488 | United Arab Emirates | 0.715 | ... | 221520.0 | 4.0 | 2022-12-02 | Ooh, ooh-ooh\nOoh-ooh-ooh, ooh, ooh-ooh (Just ... | ooh oohooh oohoohooh ooh oohooh just cant beli... | Asia | ARE | 458 | 456 | believe man want somebody say saw person kiss ... |
5 rows × 30 columns
To perform sentiment analysis on the lyrics, this project uses VADER (Valence Aware Dictionary and sEntiment Reasoner). VADER is a widely used, rule-based sentiment analysis tool built on a lexicon of words and phrases that have been rated for positive or negative sentiment. It can be applied to text from various sources, such as social media posts, customer reviews, song lyrics, and news articles, to determine the overall sentiment expressed. VADER considers not only individual words but also their context: it accounts for intensifiers and negations when interpreting a word's sentiment. For each text it returns the proportions of negative (neg), neutral (neu), and positive (pos) content, plus a compound score between -1 (extremely negative) and 1 (extremely positive). Because it requires no training, it is an efficient way to analyze large volumes of text. The following results are obtained using VADER to analyze the sentiment of popular song lyrics.
sia = SentimentIntensityAnalyzer()  # instantiate the analyzer once and reuse it

def get_sentiment_1(text):
    '''get sentiment scores of a text'''
    #newWords = {'good': 2.0, 'down': 2.0, 'normal': 2.0, 'well': 2.0}
    #sia.lexicon.update(newWords) #update words if needed
    sentiment_score = sia.polarity_scores(text)  # dict with neg, neu, pos, compound
    return sentiment_score
def get_scores_1(lst):
    '''get scores for each text'''
    scores_ls = []
    for i in lst:
        score = get_sentiment_1(i)
        scores_ls.append(score)
    return scores_ls
%%time
# GET SENTIMENT SCORES FOR EVERY LYRICS
scores_df = pd.DataFrame(get_scores_1(song_data['lyrics_clean']))
scores_df
CPU times: total: 48.9 s Wall time: 50.2 s
| | neg | neu | pos | compound |
|---|---|---|---|---|
| 0 | 0.017 | 0.249 | 0.734 | 0.9996 |
| 1 | 0.161 | 0.401 | 0.438 | 0.9953 |
| 2 | 0.208 | 0.446 | 0.346 | 0.9784 |
| 3 | 0.042 | 0.605 | 0.353 | 0.9970 |
| 4 | 0.084 | 0.654 | 0.261 | 0.9814 |
| ... | ... | ... | ... | ... |
| 6804 | 0.088 | 0.718 | 0.195 | 0.9837 |
| 6805 | 0.139 | 0.747 | 0.114 | -0.2774 |
| 6806 | 0.048 | 0.426 | 0.526 | 0.9991 |
| 6807 | 0.000 | 0.843 | 0.157 | 0.9961 |
| 6808 | 0.129 | 0.805 | 0.066 | -0.8885 |
6809 rows × 4 columns
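Many of the compound scores above sit close to ±1. A likely reason is the way VADER maps the summed word valences into the interval [-1, 1]: in the reference implementation the sum x is normalized as x / sqrt(x² + α) with α = 15, so long texts with many sentiment-bearing words saturate quickly. The sketch below reproduces only that normalization step in plain Python. The function name `normalize_valence` and the example sums are illustrative assumptions, not part of the analysis above.

```python
import math

def normalize_valence(total, alpha=15.0):
    """Map an unbounded valence sum into [-1, 1], as VADER does for `compound`."""
    score = total / math.sqrt(total * total + alpha)
    return max(-1.0, min(1.0, score))

# Hypothetical valence sums: a short phrase vs. a full song lyric
for total in (1.9, 10.0, 60.0, -25.0):
    print(f"sum={total:6.1f} -> compound={normalize_valence(total):.4f}")
```

Since a full lyric can easily accumulate a large valence sum, the neg, neu, and pos proportions may be more informative than the compound score when comparing long texts.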
# ADD TO DATAFRAME
dt = song_data[['country', 'continent', 'lyrics_clean']].reset_index()
df_all = pd.concat([dt, scores_df], axis=1)
df_all
| | index | country | continent | lyrics_clean | neg | neu | pos | compound |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | United Arab Emirates | Asia | good gold dream sell right til build home watc... | 0.017 | 0.249 | 0.734 | 0.9996 |
| 1 | 1 | United Arab Emirates | Asia | fan even though salty hate see broad know happ... | 0.161 | 0.401 | 0.438 | 0.9953 |
| 2 | 2 | United Arab Emirates | Asia | take look inside heart room room would hold br... | 0.208 | 0.446 | 0.346 | 0.9784 |
| 3 | 3 | United Arab Emirates | Asia | another banger baby calm calm girl body put he... | 0.042 | 0.605 | 0.353 | 0.9970 |
| 4 | 4 | United Arab Emirates | Asia | believe man want somebody say saw person kiss ... | 0.084 | 0.654 | 0.261 | 0.9814 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 6804 | 7295 | South Africa | Africa | cook thing man get high fade pure way hullabal... | 0.088 | 0.718 | 0.195 | 0.9837 |
| 6805 | 7296 | South Africa | Africa | sweet love yeah didnt mean say didnt love tigh... | 0.139 | 0.747 | 0.114 | -0.2774 |
| 6806 | 7297 | South Africa | Africa | first wisdom fear hear child piano first wisdo... | 0.048 | 0.426 | 0.526 | 0.9991 |
| 6807 | 7298 | South Africa | Africa | mother mother mother mother mother mother moth... | 0.000 | 0.843 | 0.157 | 0.9961 |
| 6808 | 7299 | South Africa | Africa | let dude know work go closet go break bone saw... | 0.129 | 0.805 | 0.066 | -0.8885 |
6809 rows × 8 columns
# COMPUTE AVERAGE SENTIMENT SCORES FOR EACH COUNTRY
# select the numeric score columns before averaging to avoid the numeric_only FutureWarning
grouped_df = df_all.groupby('country')[['neg','neu','pos','compound']].mean().reset_index()
grouped_df
| | country | neg | neu | pos | compound |
|---|---|---|---|---|---|
| 0 | Argentina | 0.163455 | 0.563202 | 0.273343 | 0.522222 |
| 1 | Australia | 0.141630 | 0.580580 | 0.277740 | 0.537319 |
| 2 | Austria | 0.147273 | 0.555970 | 0.296677 | 0.572676 |
| 3 | Belarus | 0.193768 | 0.543474 | 0.262789 | 0.331445 |
| 4 | Belgium | 0.146310 | 0.580670 | 0.272930 | 0.424486 |
| ... | ... | ... | ... | ... | ... |
| 68 | United Arab Emirates | 0.132875 | 0.589146 | 0.277885 | 0.570010 |
| 69 | United Kingdom | 0.146571 | 0.579173 | 0.274194 | 0.518237 |
| 70 | Uruguay | 0.158385 | 0.563583 | 0.278042 | 0.565343 |
| 71 | Venezuela | 0.149444 | 0.572525 | 0.278030 | 0.490106 |
| 72 | Vietnam | 0.141152 | 0.534359 | 0.324511 | 0.708450 |
73 rows × 5 columns
# PLOT
pd.options.plotting.backend = "plotly"
grouped_df.plot.bar(y='country', x=['neg','neu','pos','compound'],
                    title='Average sentiment scores by country',
                    template='plotly_dark')
Figure 4.1
Another Python library used for sentiment analysis of the lyrics is TextBlob, which provides a simple interface for scoring text. With its default analyzer (`PatternAnalyzer`, used here), TextBlob is also lexicon-based: it looks up the polarity and subjectivity of the words in the text and averages them, taking into account negations and intensifiers that modify a word's sentiment. (A machine-learning alternative, `NaiveBayesAnalyzer`, is available but is not used in this project.) TextBlob returns a polarity score for the entire text, ranging from -1 (extremely negative) to 1 (extremely positive), and a subjectivity score from 0 (objective) to 1 (subjective), indicating the degree to which the text expresses an opinion rather than a fact.
def get_sentiment_2(text):
    '''get sentiment scores of a text'''
    blob = TextBlob(text)  # wrap the text in a TextBlob object
    sentiment_score = blob.sentiment  # namedtuple with polarity and subjectivity
    dic = {'polarity': sentiment_score.polarity, 'subjectivity': sentiment_score.subjectivity}
    return dic
def get_scores_2(lst):
    '''get scores for each text'''
    scores_ls = []
    for i in lst:
        score = get_sentiment_2(i)
        scores_ls.append(score)
    return scores_ls
%%time
# GET SENTIMENT SCORES FOR EVERY LYRICS
scores_df1 = pd.DataFrame(get_scores_2(song_data['lyrics_clean']))
df_all1 = pd.concat([df_all, scores_df1], axis=1)
df_all1
CPU times: total: 4.83 s Wall time: 4.87 s
| | index | country | continent | lyrics_clean | neg | neu | pos | compound | polarity | subjectivity |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | United Arab Emirates | Asia | good gold dream sell right til build home watc... | 0.017 | 0.249 | 0.734 | 0.9996 | 0.499696 | 0.549696 |
| 1 | 1 | United Arab Emirates | Asia | fan even though salty hate see broad know happ... | 0.161 | 0.401 | 0.438 | 0.9953 | 0.270764 | 0.394594 |
| 2 | 2 | United Arab Emirates | Asia | take look inside heart room room would hold br... | 0.208 | 0.446 | 0.346 | 0.9784 | 0.439015 | 0.563258 |
| 3 | 3 | United Arab Emirates | Asia | another banger baby calm calm girl body put he... | 0.042 | 0.605 | 0.353 | 0.9970 | 0.141603 | 0.504190 |
| 4 | 4 | United Arab Emirates | Asia | believe man want somebody say saw person kiss ... | 0.084 | 0.654 | 0.261 | 0.9814 | 0.152885 | 0.368269 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 6804 | 7295 | South Africa | Africa | cook thing man get high fade pure way hullabal... | 0.088 | 0.718 | 0.195 | 0.9837 | 0.163435 | 0.400835 |
| 6805 | 7296 | South Africa | Africa | sweet love yeah didnt mean say didnt love tigh... | 0.139 | 0.747 | 0.114 | -0.2774 | 0.010605 | 0.311858 |
| 6806 | 7297 | South Africa | Africa | first wisdom fear hear child piano first wisdo... | 0.048 | 0.426 | 0.526 | 0.9991 | 0.500275 | 0.801515 |
| 6807 | 7298 | South Africa | Africa | mother mother mother mother mother mother moth... | 0.000 | 0.843 | 0.157 | 0.9961 | 0.200000 | 1.000000 |
| 6808 | 7299 | South Africa | Africa | let dude know work go closet go break bone saw... | 0.129 | 0.805 | 0.066 | -0.8885 | -0.221528 | 0.622917 |
6809 rows × 10 columns
# COMPUTE AVERAGE SCORES FOR EVERY COUNTRY
# select the numeric score columns before averaging to avoid the numeric_only FutureWarning
score_cols = ['neg','neu','pos','compound','polarity','subjectivity']
grouped_df = df_all1.groupby('country')[score_cols].mean().reset_index()
grouped_df
| | country | neg | neu | pos | compound | polarity | subjectivity |
|---|---|---|---|---|---|---|---|
| 0 | Argentina | 0.163455 | 0.563202 | 0.273343 | 0.522222 | 0.104611 | 0.527113 |
| 1 | Australia | 0.141630 | 0.580580 | 0.277740 | 0.537319 | 0.144419 | 0.501962 |
| 2 | Austria | 0.147273 | 0.555970 | 0.296677 | 0.572676 | 0.119142 | 0.500247 |
| 3 | Belarus | 0.193768 | 0.543474 | 0.262789 | 0.331445 | 0.104841 | 0.513782 |
| 4 | Belgium | 0.146310 | 0.580670 | 0.272930 | 0.424486 | 0.128526 | 0.514742 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 68 | United Arab Emirates | 0.132875 | 0.589146 | 0.277885 | 0.570010 | 0.165916 | 0.491276 |
| 69 | United Kingdom | 0.146571 | 0.579173 | 0.274194 | 0.518237 | 0.115046 | 0.489386 |
| 70 | Uruguay | 0.158385 | 0.563583 | 0.278042 | 0.565343 | 0.110547 | 0.533040 |
| 71 | Venezuela | 0.149444 | 0.572525 | 0.278030 | 0.490106 | 0.109068 | 0.518744 |
| 72 | Vietnam | 0.141152 | 0.534359 | 0.324511 | 0.708450 | 0.163098 | 0.514426 |
73 rows × 7 columns
# PLOT
pd.options.plotting.backend = "plotly"
grouped_df.plot.bar(y='country', x=['polarity','subjectivity'],
                    title='Average scores by country', template='plotly_dark')
Figure 4.2
In this plot, Pakistan's polarity bar is barely visible because its average polarity score is very close to zero (0.000327).
# GET ISO-3 CODES FOR EACH COUNTRY
iso = song_data[['country','continent', 'iso_alpha3']].drop_duplicates()
merged_df = pd.merge(grouped_df, iso, on='country')
merged_df
| | country | neg | neu | pos | compound | polarity | subjectivity | continent | iso_alpha3 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Argentina | 0.163455 | 0.563202 | 0.273343 | 0.522222 | 0.104611 | 0.527113 | South America | ARG |
| 1 | Australia | 0.141630 | 0.580580 | 0.277740 | 0.537319 | 0.144419 | 0.501962 | Oceania | AUS |
| 2 | Austria | 0.147273 | 0.555970 | 0.296677 | 0.572676 | 0.119142 | 0.500247 | Europe | AUT |
| 3 | Belarus | 0.193768 | 0.543474 | 0.262789 | 0.331445 | 0.104841 | 0.513782 | Europe | BLR |
| 4 | Belgium | 0.146310 | 0.580670 | 0.272930 | 0.424486 | 0.128526 | 0.514742 | Europe | BEL |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 68 | United Arab Emirates | 0.132875 | 0.589146 | 0.277885 | 0.570010 | 0.165916 | 0.491276 | Asia | ARE |
| 69 | United Kingdom | 0.146571 | 0.579173 | 0.274194 | 0.518237 | 0.115046 | 0.489386 | Europe | GBR |
| 70 | Uruguay | 0.158385 | 0.563583 | 0.278042 | 0.565343 | 0.110547 | 0.533040 | South America | URY |
| 71 | Venezuela | 0.149444 | 0.572525 | 0.278030 | 0.490106 | 0.109068 | 0.518744 | South America | VEN |
| 72 | Vietnam | 0.141152 | 0.534359 | 0.324511 | 0.708450 | 0.163098 | 0.514426 | Asia | VNM |
73 rows × 9 columns
# TOP AND BOTTOM 5 COMPOUND SCORES
merged_df[['country', 'compound']].sort_values('compound',ascending =False)
| | country | compound |
|---|---|---|
| 34 | Japan | 0.779608 |
| 59 | South Korea | 0.760777 |
| 26 | Hong Kong | 0.716759 |
| 72 | Vietnam | 0.708450 |
| 22 | Germany | 0.682694 |
| ... | ... | ... |
| 54 | Romania | 0.337070 |
| 3 | Belarus | 0.331445 |
| 15 | Dominican Republic | 0.316231 |
| 67 | Ukraine | 0.299052 |
| 65 | Turkey | 0.232454 |
73 rows × 2 columns
# PLOT RESULTS
fig = px.scatter_geo(merged_df, locations='iso_alpha3', color="compound",
                     hover_name="country", size="compound",
                     title='Average sentiment score (compound) by country')
fig.update_geos(showcoastlines=True, coastlinecolor="white",
                showocean=True, oceancolor="black")
fig.show()
Figure 4.3
The average compound score differs across countries. Japan (0.78), South Korea (0.76), Hong Kong (0.72), Vietnam (0.71), and Germany (0.68) have the highest averages, which suggests that the songs popular in these countries carry predominantly positive sentiment. Conversely, Turkey (0.23) and Ukraine (0.30) have the lowest averages. The low score for Ukraine may be related to the ongoing war in the country, and other political or social issues may lead listeners in Turkey to favor songs with less positive themes, but the lyrics alone cannot confirm either explanation.
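Before reading much into differences between country averages, it can help to check how uncertain each average is. The following is a minimal sketch of a percentile bootstrap confidence interval for a mean, using only the standard library. The function `bootstrap_mean_ci` and the two score lists are hypothetical and are not taken from the dataset.

```python
import random
import statistics

def bootstrap_mean_ci(values, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lower_idx = int((1 - level) / 2 * n_resamples)
    upper_idx = int((1 + level) / 2 * n_resamples) - 1
    return means[lower_idx], means[upper_idx]

# Hypothetical per-song compound scores for two countries (not real data)
country_a = [0.99, 0.98, -0.28, 0.97, 0.95, 0.99, -0.89, 0.96, 0.93, 0.99]
country_b = [0.99, -0.95, -0.90, 0.98, -0.70, 0.97, 0.10, -0.99, 0.96, 0.85]

for name, scores in (("A", country_a), ("B", country_b)):
    low, high = bootstrap_mean_ci(scores)
    print(f"country {name}: mean={statistics.fmean(scores):.3f}, "
          f"95% CI=({low:.3f}, {high:.3f})")
```

If the intervals of two countries overlap heavily, the difference between their averages may not be meaningful. In the notebook, the per-song scores for a country could be obtained with something like `df_all.loc[df_all['country'] == 'Japan', 'compound'].tolist()`.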
# TOP AND BOTTOM 5 POLARITY SCORES
merged_df[['country', 'polarity']].sort_values('polarity',ascending =False)
| | country | polarity |
|---|---|---|
| 30 | Indonesia | 0.191214 |
| 39 | Malaysia | 0.174983 |
| 64 | Thailand | 0.168123 |
| 68 | United Arab Emirates | 0.165916 |
| 72 | Vietnam | 0.163098 |
| ... | ... | ... |
| 23 | Greece | 0.078557 |
| 15 | Dominican Republic | 0.063725 |
| 65 | Turkey | 0.055759 |
| 29 | India | 0.052369 |
| 47 | Pakistan | 0.000327 |
73 rows × 2 columns
# PLOT RESULTS
fig1 = px.scatter_geo(merged_df, locations='iso_alpha3', color="polarity",
                      hover_name="country", size="polarity",
                      title='Average sentiment score (polarity) by country')
fig1.update_geos(showcoastlines=True, coastlinecolor="white",
                 showocean=True, oceancolor="black")
fig1.show()
Figure 4.4
As with the average compound scores in Figure 4.3, the average polarity scores vary across countries. Pakistan has by far the lowest average polarity (0.000327), which is why its bar is barely visible in Figure 4.2. The other countries with the lowest TextBlob polarity are India, Turkey, the Dominican Republic, and Greece. Conversely, Indonesia, Malaysia, Thailand, the United Arab Emirates, and Vietnam have the highest average polarity scores.
# TOP AND BOTTOM 5 SUBJECTIVITY SCORES
merged_df[['country', 'subjectivity']].sort_values('subjectivity',ascending =False)
| | country | subjectivity |
|---|---|---|
| 53 | Portugal | 0.541602 |
| 9 | Chile | 0.540034 |
| 6 | Brazil | 0.538437 |
| 64 | Thailand | 0.537607 |
| 48 | Panama | 0.537318 |
| ... | ... | ... |
| 51 | Philippines | 0.481905 |
| 55 | Saudi Arabia | 0.480588 |
| 23 | Greece | 0.473276 |
| 45 | Nigeria | 0.470287 |
| 29 | India | 0.460736 |
73 rows × 2 columns
# PLOT RESULTS
fig1 = px.scatter_geo(merged_df, locations='iso_alpha3', color="subjectivity",
                      hover_name="country", size="subjectivity",
                      title='Average subjectivity by country')
fig1.update_geos(showcoastlines=True, coastlinecolor="white",
                 showocean=True, oceancolor="black")
fig1.show()
Figure 4.5
As for subjectivity, Portugal, Chile, Brazil, Thailand, and Panama have the highest average subjectivity scores among the countries analyzed, which suggests that songs popular in these countries contain more subjective language and expressions of personal feelings or beliefs. In contrast, the Philippines, Saudi Arabia, Greece, Nigeria, and India have the lowest subjectivity scores, suggesting somewhat more objective lyrics. The differences are small, however: the country averages span only about 0.46 to 0.54.
The NRC Emotion Lexicon is another lexicon-based resource, focused on identifying and quantifying the emotional content of text. Developed at the National Research Council of Canada, it associates English words with eight basic emotions (anger, anticipation, disgust, fear, joy, sadness, surprise, and trust) and two sentiments (negative and positive), and translations into other languages are available. NRCLex is a Python package built on this lexicon; according to its documentation, its affect dictionary contains approximately 27,000 words. One advantage of this approach is that it distinguishes between emotions rather than reporting a single positive-to-negative score, making it useful for understanding how individuals feel about a particular topic or product. However, its effectiveness may be limited when a text contains sarcasm, irony, or other forms of indirect speech. For this project, NRCLex is used to identify how the popular song lyrics are distributed emotion-wise.
def get_emotion(ls):
    '''get dominant emotions'''
    emotions = []
    for i in ls:
        text_obj = NRCLex(i)
        #print(text_obj.raw_emotion_scores)
        emotion = text_obj.affect_frequencies
        max_key = max(emotion, key=emotion.get)  # category with the highest frequency
        emotions.append(max_key)
    return emotions
%%time
# GET EMOTIONS
em = get_emotion(song_data['lyrics_clean'])
len(em)
CPU times: total: 4.91 s Wall time: 4.94 s
6809
# CONVERT TO DATAFRAME
df_all1['emotions'] = em
df_all1.head()
| | index | country | continent | lyrics_clean | neg | neu | pos | compound | polarity | subjectivity | emotions |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | United Arab Emirates | Asia | good gold dream sell right til build home watc... | 0.017 | 0.249 | 0.734 | 0.9996 | 0.499696 | 0.549696 | positive |
| 1 | 1 | United Arab Emirates | Asia | fan even though salty hate see broad know happ... | 0.161 | 0.401 | 0.438 | 0.9953 | 0.270764 | 0.394594 | positive |
| 2 | 2 | United Arab Emirates | Asia | take look inside heart room room would hold br... | 0.208 | 0.446 | 0.346 | 0.9784 | 0.439015 | 0.563258 | negative |
| 3 | 3 | United Arab Emirates | Asia | another banger baby calm calm girl body put he... | 0.042 | 0.605 | 0.353 | 0.9970 | 0.141603 | 0.504190 | positive |
| 4 | 4 | United Arab Emirates | Asia | believe man want somebody say saw person kiss ... | 0.084 | 0.654 | 0.261 | 0.9814 | 0.152885 | 0.368269 | positive |
# PLOT RESULTS
fig = px.histogram(df_all1, x="emotions", color='emotions',
                   color_discrete_sequence=px.colors.qualitative.Dark2,
                   title='Emotion-wise distribution of lyrics', template='plotly_dark')
fig.show()
Figure 4.6
The graph shows that most popular songs have positive as their dominant category, while relatively few song lyrics are dominated by anger. Note that `get_emotion` takes the maximum over all NRCLex categories, so the two broad sentiments (positive and negative) compete directly with the eight basic emotions. The distribution is useful because it may help us draw inferences about the collective emotions of music listeners and identify global social or political issues that may be contributing to this trend.
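To see which of the eight basic emotions dominates a lyric, the broad positive and negative categories can be left out before taking the maximum. The sketch below shows one way to do that on a plain dictionary shaped like the output of `affect_frequencies`. The helper `dominant_basic_emotion` and the example frequencies are hypothetical.

```python
BROAD_SENTIMENTS = {"positive", "negative"}

def dominant_basic_emotion(affect_frequencies):
    """Return the highest-scoring key after dropping 'positive' and 'negative'."""
    basic = {k: v for k, v in affect_frequencies.items()
             if k not in BROAD_SENTIMENTS}
    if not basic or max(basic.values()) == 0:
        return None  # no basic emotion detected in the text
    return max(basic, key=basic.get)

# Hypothetical frequencies in the shape of NRCLex(...).affect_frequencies
example = {"fear": 0.05, "anger": 0.02, "trust": 0.15, "joy": 0.22,
           "sadness": 0.08, "positive": 0.30, "negative": 0.18}

print(max(example, key=example.get))    # label chosen when all keys compete
print(dominant_basic_emotion(example))  # label among the basic emotions only
```

Keeping both labels side by side would separate the overall polarity of a lyric from its most prominent specific emotion.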
# PLOT EMOTIONS BY REGION
fig = px.histogram(df_all1, x="emotions", color='continent',
                   color_discrete_sequence=px.colors.qualitative.Pastel,
                   title='Regional lyrics emotions', template='plotly_dark')
fig.show()
Figure 4.7
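In absolute numbers, most of the songs with positive emotions are popular in European and Asian countries. Because the histogram shows raw counts, this may partly reflect how many songs each continent contributes to the dataset.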
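One way to compare continents on an equal footing is to convert the counts into shares within each continent. The following is a minimal standard-library sketch. The function `emotion_shares` and the example records are hypothetical and do not reproduce the real dataset.

```python
from collections import Counter, defaultdict

def emotion_shares(records):
    """Return {continent: {emotion: share}}, with shares summing to 1 per continent."""
    counts = defaultdict(Counter)
    for continent, emotion in records:
        counts[continent][emotion] += 1
    return {
        continent: {emotion: n / sum(c.values()) for emotion, n in c.items()}
        for continent, c in counts.items()
    }

# Hypothetical (continent, dominant emotion) pairs, not the real dataset
records = ([("Europe", "positive")] * 6 + [("Europe", "negative")] * 4
           + [("Africa", "positive")] * 3 + [("Africa", "negative")] * 1)

for continent, shares in emotion_shares(records).items():
    print(continent, {e: round(s, 2) for e, s in shares.items()})
```

In this made-up example Europe has more positive songs in absolute terms, but Africa has the larger positive share. With the notebook's dataframe, `pd.crosstab(df_all1['continent'], df_all1['emotions'], normalize='index')` should give the same kind of table.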